NSF PAR Search | NSF Public Access Repository

Enhancing Program Analysis with Deterministic Distinguishable Calling Context

https://doi.org/10.1145/3708493.3712679

Kim, Sungkeun; Nguyen, Khanh; Tsai, Chia-Che; Lee, Jaewoo; Muzahid, Abdullah; Kim, Eun Jung (February 2025, ACM)

Calling context is crucial for improving the precision of program analyses in various use cases (clients), such as profiling, debugging, optimization, and security checking. The calling context is often encoded as a numerical value. We have observed that many clients benefit from a value that is not only deterministic but also globally distinguishable across runs, which simplifies bookkeeping and guarantees complete uniqueness. However, existing work guarantees only determinism, not global distinguishability. Clients must develop auxiliary helpers to distinguish encoded values among all calling contexts, which incurs considerable overhead. In this paper, we propose Deterministic Distinguishable Calling Context Encoding, which provides both properties of calling context encoding natively. Its key idea is to leverage the static call graph and encode each calling context as the running call path count. A mapping is thereby established statically and can be readily used by the clients. Our experiments with two client tools show that the proposed encoding has overhead comparable to that of two state-of-the-art encoding schemes, PCCE and PCC, and further avoids the expensive overheads of collision detection, up to 2.1× and 50%, for Splash-3 and SPEC CPU 2017, respectively.
Free, publicly-accessible full text available February 25, 2026
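
The idea of encoding a calling context from a static call graph and path counts, as described in the abstract above, can be illustrated with a small sketch. This is not the paper's algorithm: it assumes an acyclic call graph with at most one edge per caller/callee pair, uses a generic path-numbering scheme in the style of prior encodings, and adds a per-function base offset (an assumption made here) so that IDs do not collide across functions. All names (`build_encoding`, `encode`, `all_contexts`) are hypothetical.

```python
def build_encoding(call_graph, root):
    """Assign edge increments and per-function offsets on an acyclic call graph.

    call_graph maps each caller to a list of callees (assumed: no recursion,
    at most one edge per caller/callee pair).
    """
    # Topological order (callers before callees) via reversed DFS postorder.
    order, seen = [], set()

    def visit(fn):
        if fn in seen:
            return
        seen.add(fn)
        for callee in call_graph.get(fn, []):
            visit(callee)
        order.append(fn)

    visit(root)
    order.reverse()

    # num_paths[f] = number of distinct call paths from root that end in f.
    num_paths = {fn: 0 for fn in order}
    num_paths[root] = 1
    edge_inc = {}
    for caller in order:
        for callee in call_graph.get(caller, []):
            # Paths arriving over this edge get the next free ID range of the callee.
            edge_inc[(caller, callee)] = num_paths[callee]
            num_paths[callee] += num_paths[caller]

    # Assumption of this sketch: a running offset per function makes IDs
    # distinct across all functions, not only within one function.
    base, total = {}, 0
    for fn in order:
        base[fn] = total
        total += num_paths[fn]
    return edge_inc, base, total


def encode(path, edge_inc, base):
    """Encode a call path (list of functions starting at root) as one integer."""
    local = sum(edge_inc[(a, b)] for a, b in zip(path, path[1:]))
    return base[path[-1]] + local


def all_contexts(call_graph, root):
    """Enumerate every call path that starts at root."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        yield path
        for callee in call_graph.get(path[-1], []):
            stack.append(path + [callee])


graph = {"main": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d"]}
edge_inc, base, total = build_encoding(graph, "main")
ids = sorted(encode(p, edge_inc, base) for p in all_contexts(graph, "main"))
```

Because the increments and offsets depend only on the static call graph, the same call path maps to the same integer in every run, and in this sketch no two distinct paths share an integer.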
Customizing Cache Indexing Through Entropy Estimation

https://doi.org/10.1109/MICRO61859.2024.00041

Weston, Kevin; Johnson, Avery; Janfaza, Vahid; Mahmud, Farabi; Muzahid, Abdullah (November 2024, IEEE)

Full Text Available
Attack of the Knights: Non Uniform Cache Side Channel Attack

https://doi.org/10.1145/3627106.3627199

Mahmud, Farabi; Kim, Sungkeun; Chawla, Harpreet Singh; Kim, Eun Jung; Tsai, Chia-Che; Muzahid, Abdullah (December 2023, ACM)
Post-Silicon Customization Using Deep Neural Networks

https://doi.org/10.1007/978-3-031-42785-5_9

Weston, Kevin; Janfaza, Vahid; Taur, Abhishek; Mungra, Dhara; Kansal, Arnav; Zahran, Mohammed; Muzahid, Abdullah (June 2023, International Conference on Architecture of Computing Systems)

Full Text Available
SmartIndex: Learning to Index Caches to Improve Performance

https://doi.org/10.1109/LCA.2023.3264478

Weston, Kevin; Mahmud, Farabi; Janfaza, Vahid; Muzahid, Abdullah (January 2023, IEEE Computer Architecture Letters)

Full Text Available
MERCURY: Accelerating DNN Training By Exploiting Input Similarity

https://doi.org/10.1109/HPCA56546.2023.10071051

Janfaza, Vahid; Weston, Kevin; Razavi, Moein; Mandal, Shantanu; Mahmud, Farabi; Hilty, Alex; Muzahid, Abdullah (February 2023, IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Full Text Available
WHISTLE: CPU Abstractions for Hardware and Software Memory Safety Invariants

https://doi.org/10.1109/TC.2022.3180990

Kim, Sungkeun; Mahmud, Farabi; Huang, Jiayi; Majumder, Pritam; Tsai, Chia-Che; Muzahid, Abdullah; Kim, Eun Jung (June 2022, IEEE Transactions on Computers)

Memory safety invariants extracted from a program can help defend against and detect both software and hardware memory violations. For instance, by allowing only specific instructions to access certain memory locations, a system can detect out-of-bound or illegal pointer dereferences that lead to correctness and security issues. In this paper, we propose CPU abstractions, called WHISTLE, to specify and check program invariants and thereby provide a defense mechanism against both software and hardware memory violations at runtime. WHISTLE ensures that the invariants are satisfied at every memory access. We present a fast invariant address translation and retrieval scheme using a specialized cache. It stores and checks invariants related to global, stack, and heap objects. The invariant checks can be performed synchronously or asynchronously. WHISTLE uses synchronous checking for highly security-critical programs, while others are protected by asynchronous checking. A fast exception is proposed to report any violation as soon as possible in order to close the gap for transient attacks. Our evaluation shows that WHISTLE can detect both software and hardware, spatial and temporal memory violations. WHISTLE incurs 53% overhead when checking synchronously, or 15% overhead when checking asynchronously.
Full Text Available
XMeter: Finding Approximable Functions and Predicting Their Accuracy

https://doi.org/10.1109/TC.2020.3005083

Akram, Riad; Mandal, Shantanu; Muzahid, Abdullah (July 2021, IEEE Transactions on Computers)

Full Text Available
Communication Algorithm-Architecture Co-Design for Distributed Deep Learning

https://doi.org/10.1109/ISCA52012.2021.00023

Huang, Jiayi; Majumder, Pritam; Kim, Sungkeun; Muzahid, Abdullah; Yum, Ki Hwan; Kim, Eun Jung (June 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA))

Large-scale distributed deep learning training has enabled the development of more complex deep neural network models that learn from larger datasets for sophisticated tasks. In particular, distributed stochastic gradient descent intensively invokes all-reduce operations for gradient updates, which dominate communication time during iterative training epochs. In this work, we identify the inefficiency in widely used all-reduce algorithms and the opportunity for algorithm-architecture co-design. We propose the MultiTree all-reduce algorithm, which is aware of topology and resource utilization, for efficient and scalable all-reduce operations; it is applicable to different interconnect topologies. Moreover, we co-design the network interface to schedule and coordinate the all-reduce messages for contention-free communication, working in synergy with the algorithm. The flow control is also simplified to exploit the bulk data transfer of large gradient exchanges. We evaluate the co-design using different all-reduce data sizes in a synthetic study, demonstrating its effectiveness on various interconnection network topologies, and using state-of-the-art deep neural networks in real workload experiments. The results show that MultiTree achieves 2.3× and 1.56× communication speedup, as well as up to 81% and 30% training time reduction compared to ring all-reduce and state-of-the-art approaches, respectively.
Full Text Available
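
For readers unfamiliar with the ring all-reduce baseline named in the abstract above, the following is a minimal single-process simulation of it. It is not the MultiTree algorithm and models no network topology or flow control; the function name `ring_all_reduce` and the list-of-lists representation of worker buffers are assumptions of this sketch.

```python
def ring_all_reduce(buffers):
    """Simulate ring all-reduce (sum) over equal-length per-worker buffers.

    buffers[w] is worker w's local gradient vector. Returns the per-worker
    buffers after the operation; every worker ends with the element-wise sum.
    """
    n = len(buffers)
    size = len(buffers[0])
    # Split each vector into n contiguous chunks (some may be empty).
    bounds = [(c * size // n, (c + 1) * size // n) for c in range(n)]
    data = [list(b) for b in buffers]

    # Phase 1, reduce-scatter: in step s, worker w sends chunk (w - s) mod n
    # to its right neighbour, which accumulates it into its own copy.
    for s in range(n - 1):
        sends = []
        for w in range(n):
            lo, hi = bounds[(w - s) % n]
            sends.append(((w + 1) % n, lo, data[w][lo:hi]))
        for dst, lo, payload in sends:
            for i, value in enumerate(payload):
                data[dst][lo + i] += value

    # Worker w now holds the fully reduced chunk (w + 1) mod n.
    # Phase 2, all-gather: reduced chunks circulate and overwrite stale copies.
    for s in range(n - 1):
        sends = []
        for w in range(n):
            lo, hi = bounds[(w + 1 - s) % n]
            sends.append(((w + 1) % n, lo, data[w][lo:hi]))
        for dst, lo, payload in sends:
            data[dst][lo:lo + len(payload)] = payload
    return data


grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
reduced = ring_all_reduce(grads)
```

Each worker sends 2(n - 1) chunk-sized messages in total, all to a single neighbour, so the number of sequential steps grows linearly with the number of workers.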
Learning Fitness Functions for Machine Programming

Mandal, Shantanu; Anderson, Todd; Turek, Javier; Gottschlich, Justin; Zhou, Shengtian; Muzahid, Abdullah (April 2021, Machine Learning and Systems)

The problem of automatic software generation has been referred to as machine programming. In this work, we propose a framework based on genetic algorithms to help make progress in this domain. Although genetic algorithms (GAs) have been successfully used for many problems, one criticism is that hand-crafting a GA's fitness function, the test that aims to effectively guide its evolution, can be notably challenging. Our framework presents a novel approach that learns the fitness function, using neural networks to predict values of ideal fitness functions. We also augment the evolutionary process with a minimally intrusive search heuristic. This heuristic improves the framework's ability to discover correct programs from ones that are approximately correct and does so with negligible computational overhead. We compare our approach with several state-of-the-art program synthesis methods and demonstrate that it finds more correct programs with fewer candidate program generations.
Full Text Available